Statistics and Data Science in the Insurance Industry

Nathan Lally: Data Scientist @ New England Statistical Society & HSB (Munich Re)
2019-04-21

About Me

Education
  • BA Political Science (UConn)
    • Study abroad in Mexico
  • BA Mathematics/Statistics (UConn)
    • Thesis: Predictive modeling for long term care insurance claims occurrence
  • MS Mathematics (UConn)
    • Thesis: Predicting damage to electrical infrastructure during hurricanes using Bayesian spatial modeling techniques
Career
  • General Dynamics Electric Boat
  • The Hartford Insurance Group
  • Pratt & Whitney
  • Hartford Steam Boiler (Munich Re)
Fun
  • Weight lifting
  • Skateboarding
  • Cats
  • Volunteering for the New England Statistical Society

About Me

Obligatory cat pictures

Tina

Tina

My Current Job

Hartford Steam Boiler (Munich Re)

Sr. Machine Learning Modeler

  • Responsibilities:
    • Core Product Pricing Models
    • IoT Product Development & Modeling
    • Claims analytics
    • Internal Training & Education
    • Mentoring Jr. Modelers

My Current Job

So what do I spend most of my time doing?

I build statistical and machine learning models to predict the price (premium) we should charge consumers for,

  • Commercial & personal lines equipment breakdown insurance
  • Commercial & personal lines service lines insurance
  • Employment practices liability insurance
  • And more to come!

So let me take you on a journey through the world of insurance product pricing. Excitement abounds at every turn!

A Primer on Insurance Product Pricing

Insurance is a strange business…

Most products or services are fairly easy to price.

\[ \begin{align} \text{Price} &= \text{Expenses} + \text{Desired Profit} \end{align} \]

This is a somewhat simplified model, but expenses (material, manufacturing, distribution, marketing, etc.) are typically fixed or reasonably easy to estimate.

What makes insurance products different?

A Primer on Insurance Product Pricing

Insurance is a strange business…

Insurance products are not easy to price.

\[ \begin{align} \text{Price} &= \text{Expenses} + \text{Desired Profit} \end{align} \]

where,

\[ \begin{align} \text{Expenses} &= \text{Loss Cost} + \text{Fixed Expenses} \end{align} \]

Loss cost is the amount of money an insurer will pay out for claims incurred by an insured over a given policy period. However, we do not know an insured's loss cost at the point of sale; it is a random quantity. To price insurance we must estimate (predict) an insured's loss cost. We call this estimate the expected loss cost, or sometimes pure premium (premium before expense and profit loading).

A Primer on Insurance Product Pricing

A little more on loss cost

It turns out loss cost itself has two components, each of which is a random quantity.

\[ \begin{align} \text{Loss Cost} &= \text{Claims Frequency} \cdot \text{Claims Severity} \end{align} \]

  • Claims Frequency: The expected count of claims per policy period
  • Claims Severity: The expected cost of an individual claim
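A quick worked check of this identity, using the figures from one rating cell of the French auto data shown later (Male / Professional / age 25.9–33.8):

```python
# loss cost = claims frequency * claims severity
claim_count = 104
claim_cost = 187051.78
exposure = 315.085                    # earned policy-years in the cell

frequency = claim_count / exposure    # claims per policy-year
severity = claim_cost / claim_count   # cost per claim
loss_cost = frequency * severity      # equals claim_cost / exposure

print(round(frequency, 7), round(severity, 3), round(loss_cost, 4))
```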

teslamoney

A Very Traditional Statistical Approach to Product Pricing

Predicting loss cost the old way

In the days before digital computing and before many modern advances in statistics, simple methods were used to predict loss cost. Policyholder claims data would be aggregated by several explanatory/predictor variables into what are known as “ratings cells” and very basic statistics would be calculated to estimate loss cost.

\[ \begin{align} \text{Claims Frequency} &= \frac{\text{Claim Count}}{\text{Exposure}}\\ \text{Claims Severity} &= \frac{\text{Claim Cost}}{\text{Claim Count}}\\ \end{align} \]

The next slide shows an example of this methodology. The data used throughout this presentation is publicly available and comes from a major French auto insurer in 2004.
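A minimal sketch of the rating-cell aggregation described above, using a handful of toy policy records (the field names and values are hypothetical, not from the actual French dataset):

```python
from collections import defaultdict

# toy policy records: (gender, vehicle_usage, claim_count, claim_cost, exposure)
policies = [
    ("Male", "Professional", 1, 1500.0, 0.8),
    ("Male", "Professional", 0, 0.0, 1.0),
    ("Female", "Private", 2, 4200.0, 1.0),
    ("Female", "Private", 0, 0.0, 0.5),
]

# aggregate policies into rating cells keyed by (gender, usage)
cells = defaultdict(lambda: {"claims": 0, "cost": 0.0, "exposure": 0.0})
for gender, usage, n, cost, expo in policies:
    cell = cells[(gender, usage)]
    cell["claims"] += n
    cell["cost"] += cost
    cell["exposure"] += expo

# per-cell frequency, severity, and loss cost
for key, c in cells.items():
    freq = c["claims"] / c["exposure"]
    sev = c["cost"] / c["claims"] if c["claims"] else 0.0
    print(key, round(freq, 3), round(sev, 1), round(freq * sev, 1))
```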

A Very Traditional Statistical Approach to Product Pricing

Predicting loss cost the old way

Is there anything suspect about this methodology?

Gender  VehUsage      binAge       claim_count  claim_cost  exposure  severity  frequency  loss_cost
Male    Professional  [17.9,25.9]            7    16523.17    13.367  2360.453  0.5236777  1236.1165
Male    Professional  (25.9,33.8]          104   187051.78   315.085  1798.575  0.3300697   593.6550
Male    Professional  (33.8,41.7]          115   278157.78   430.341  2418.763  0.2672299   646.3660
Male    Professional  (41.7,49.6]          135   309184.28   496.930  2290.254  0.2716680   622.1888
Male    Professional  (49.6,57.5]          147   275057.24   596.273  1871.138  0.2465314   461.2941
Male    Professional  (57.5,65.4]           81   156997.70   275.397  1938.243  0.2941209   570.0778
Male    Professional  (65.4,73.3]           20    46324.39    47.924  2316.219  0.4173274   966.6219
Male    Professional  (73.3,81.2]            8    24286.42    20.089  3035.802  0.3982279  1208.9412
Male    Professional  (81.2,89.1]            0        0.00     3.000     0.000  0.0000000     0.0000

A Very Traditional Statistical Approach to Product Pricing

Predicting loss cost the old way

It turns out that choosing meaningful rating cells and estimating their associated loss costs is more of an art than a science. Actuaries would need to turn to intuition and assumptions to choose the variables that define ratings cells and to adjust values that did not seem reasonable, especially for loss cost estimates where exposure is very limited.

tacoma

Bad actuarial assumptions are somewhat less dangerous than bad engineering assumptions. This is the famous Tacoma Narrows Bridge, otherwise known as “Galloping Gertie”.

A Better Statistical Approach to Product Pricing

Predicting loss cost in the 90s

Fortunately for the insurance industry, statisticians continued to develop useful models and methods throughout the 20th century (they are still at it, trust me). Actuaries and other insurance professionals begrudgingly began to use these models for product pricing, generally a decade or two after their introduction.

The insurance industry modernizing

The image below depicts an actuary fighting with her managers after being told to use R for statistical modeling rather than continuing to create tables in Excel.

Tina

Just kidding. It is a stock photo from the BLS Occupational Outlook Handbook site on actuarial careers.

A Better Statistical Approach to Product Pricing

Predicting loss cost with generalized linear models

One such statistical model is the generalized linear model (GLM). GLMs were formalized in the early 1970s and became popular in insurance pricing applications in the 1990s. GLMs and their extensions are still used to this day in insurance pricing.

\[ \begin{align} \mathbb{E}[Y|\pmb{x}] &= g^{-1}\left(\beta_0 + \pmb{x}'\pmb{\beta}\right) \end{align} \]

The outcome or dependent variable \( Y \) is assumed to be generated from a distribution in the exponential family (more on that in a bit), the row vector \( \pmb{x} \) encodes information from a set of predictor variables, \( \beta_0 \) is called the model intercept, and the vector \( \pmb{\beta} \) represents the regression weights associated with the predictor variables.
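To make the notation concrete, here is a minimal sketch of how such a model can be fit by iteratively reweighted least squares (IRLS), shown for a Poisson outcome with log link on synthetic data. The data-generating setup is illustrative, not from the French auto dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic data from a known Poisson GLM with log link
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
beta_true = np.array([-1.0, 0.5])
y = rng.poisson(np.exp(X @ beta_true))                 # simulated claim counts

# iteratively reweighted least squares (IRLS)
beta = np.zeros(2)
for _ in range(25):
    eta = X @ beta                  # linear predictor
    mu = np.exp(eta)                # inverse link: E[Y|x]
    z = eta + (y - mu) / mu         # working response
    XtW = X.T * mu                  # IRLS weights equal mu for Poisson/log
    beta = np.linalg.solve(XtW @ X, XtW @ z)

print(beta)  # close to beta_true
```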

A Better Statistical Approach to Product Pricing

Advantages of the GLM over spreadsheet methods
  • Probability models enable richer inference about the loss generating process
  • Ability to make valid inference about both main effects and interaction effects
  • Rigorous statistical methods to do model and variable selection
  • GLMs can accommodate continuous predictor variables (no need for binning)
  • Borrow strength to inform estimates where you have little data (exposure)

A Better Statistical Approach to Product Pricing

Typical Insurance Assumptions

Let \( N \) be a random variable representing claims counts and \( Z \) be a random variable representing claims costs. A popular (and generally useful) assumption in insurance pricing is that realizations of \( N \) are generated by a Poisson distribution and \( Z \) a gamma distribution. \[ \begin{align} f(n) &= \lambda^{n}\frac{e^{-\lambda}}{n!} \ \text{for } n \ge 0\\ f(z) &= \frac{\beta^\alpha}{\Gamma(\alpha)}z^{\alpha-1}e^{-\beta z}\ \text{for } z > 0\\ \end{align} \]

Probability Mass and Density Functions

plot of chunk pmfpdf
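Both densities can be evaluated directly from the formulas above. A small sketch with illustrative parameters (λ = 2 claims and a gamma mean of α/β = 2000; both values are made up for this example):

```python
import math

lam, alpha, beta = 2.0, 2.0, 0.001  # illustrative parameters

def poisson_pmf(n, lam):
    return lam ** n * math.exp(-lam) / math.factorial(n)

def gamma_pdf(z, alpha, beta):
    return beta ** alpha / math.gamma(alpha) * z ** (alpha - 1) * math.exp(-beta * z)

# the pmf sums to 1 and has mean lambda
total = sum(poisson_pmf(n, lam) for n in range(50))
mean = sum(n * poisson_pmf(n, lam) for n in range(50))
print(round(total, 6), round(mean, 6))

# gamma mean alpha/beta: an expected claim cost of 2000 with these parameters
print(alpha / beta)
print(gamma_pdf(2000, alpha, beta))  # density at the mean claim cost
```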

A Better Statistical Approach to Product Pricing

Two models is one too many

For years, the most popular way to model loss cost was to model claims frequency and claims severity separately with two unique models. The claims frequency model would be fit with data from all available policies while the claims severity model would be fit to only the data where claims had occurred.

  • Poisson GLM \[ \begin{equation} \lambda_i = e^{\left( \alpha_0 + \pmb{x}_i'\pmb{\alpha} + \log(c_i)\right)} \end{equation} \]

  • Gamma GLM \[ \begin{equation} \theta_i = e^{\left( \beta_0 + \pmb{x}_i'\pmb{\beta}\right)} \end{equation} \]

  • Loss Cost \[ \begin{equation} \mu_i = \lambda_i \theta_i \end{equation} \]
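A sketch of how the two fitted models combine into a loss cost prediction. The coefficient values below are made up purely for illustration:

```python
import math

# hypothetical fitted coefficients (illustration only)
alpha0, alpha1 = -2.0, 0.3   # Poisson frequency model
beta0, beta1 = 7.2, 0.1      # gamma severity model

x_i = 1.0   # a single predictor value for risk i
c_i = 1.0   # exposure: one policy-year

# frequency: expected claim count, with a log(exposure) offset
lam = math.exp(alpha0 + alpha1 * x_i + math.log(c_i))
# severity: expected cost per claim
theta = math.exp(beta0 + beta1 * x_i)
# loss cost: the product of the two model predictions
mu = lam * theta
print(round(lam, 4), round(theta, 2), round(mu, 2))
```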

A Better Statistical Approach to Product Pricing

One model to rule them all

As I said, we are fortunate that statisticians don't stop thinking. There has to be some distribution out there that can model loss cost directly rather than requiring two sub-models. In fact, such distributions have been discovered. Perhaps the most appropriate for this application is the compound Poisson-gamma distribution.

\[ \begin{align} N &\sim \text{Poisson}(\lambda)\\ Z &\sim \Gamma(\alpha, \beta)\\ Y &= \sum_{i=1}^N Z_i \end{align} \]

This is a special case of an exponential dispersion model known as the Tweedie distribution. Believe it or not, something this dry revolutionized insurance pricing.
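The compound distribution is easy to simulate, which also shows why it suits insurance data: a large point mass at zero (policies with no claims) plus a continuous right tail. A sketch with illustrative parameters (0.3 claims per year, mean claim cost 2000):

```python
import math
import random

random.seed(1)
lam, rate = 0.3, 0.001   # Poisson rate; gamma rate with shape fixed at 2

def rpois(lam):
    # Knuth's method for sampling a Poisson count (fine for small lambda)
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

def rgamma_shape2(rate):
    # gamma(shape=2, rate) as the sum of two exponentials
    return -(math.log(random.random()) + math.log(random.random())) / rate

# Y = sum of N claim costs, N ~ Poisson(lam), costs ~ gamma
sims = []
for _ in range(200_000):
    n = rpois(lam)
    sims.append(sum(rgamma_shape2(rate) for _ in range(n)))

mean_loss = sum(sims) / len(sims)                     # theory: 0.3 * 2 / 0.001 = 600
frac_zero = sum(s == 0.0 for s in sims) / len(sims)   # theory: exp(-0.3) ~ 0.741
print(round(mean_loss, 1), round(frac_zero, 3))
```

The large share of exact zeros is the feature that ordinary continuous distributions cannot capture and the Tweedie handles naturally.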

A Better Statistical Approach to Product Pricing

That was a lot of math. Let’s show a practical example…

OK, let's translate all that mess into something straightforward. For the Tweedie GLM,

  • Expected loss cost for a risk with a policy period of duration \( c_i \)

\[ \begin{align} \mu_i &= e^{\beta_0 + \pmb{x}_i'\pmb{\beta} + \log(c_i)} = e^{\beta_0 + \pmb{x}_i'\pmb{\beta}}\cdot c_i \end{align} \]

  • Base loss cost for all insureds

\[ \begin{align} \text{base loss cost} &= e^{\beta_0} \end{align} \]

  • Multiplicative adjustment to the base loss cost for a given risk

\[ \begin{align} \text{adjustment factor} &= e^{\pmb{x}_i'\pmb{\beta}} \end{align} \]
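Putting the three pieces together in code, with hypothetical coefficient values chosen only for illustration:

```python
import math

# hypothetical Tweedie GLM coefficients (illustration only)
beta0 = 5.0                    # intercept
betas = [0.25, -0.10]          # regression weights for two predictors
x_i = [1.0, 2.0]               # predictor values for risk i
c_i = 0.5                      # half a policy-year of exposure

base_loss_cost = math.exp(beta0)
adjustment = math.exp(sum(b * x for b, x in zip(betas, x_i)))
mu_i = base_loss_cost * adjustment * c_i   # expected loss cost for risk i

print(round(base_loss_cost, 2), round(adjustment, 4), round(mu_i, 2))
```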

It's even easier to understand with pictures though. The next several slides illustrate the results of a Tweedie GLM fit to the French auto claims data.

A Better Statistical Approach to Product Pricing

Loss cost relativities: Gender

plot of chunk gendrel

A Better Statistical Approach to Product Pricing

Loss cost relativities: Vehicle Usage

plot of chunk vehrel

A Better Statistical Approach to Product Pricing

Loss cost relativities: Driver Age

plot of chunk agerel

A Better Statistical Approach to Product Pricing

GLMs have their limitations

The Tweedie GLM is still a common model used to predict loss cost in the insurance industry. It can be used to produce rating plans that are easy to interpret. However, it is not without limitations, including,

  • All potential relationships between the predictor variables and expected loss cost need to be defined explicitly
  • Complicated non-linear relationships may be difficult to model adequately
  • Interaction terms need to be defined explicitly

When dealing with potentially thousands of variables this can be quite cumbersome.

A Better Statistical Approach to Product Pricing

Non-linear relationship

plot of chunk nonlin

Interaction effect

plot of chunk interact

A Statistical Learning Approach to Product Pricing

Introduction to statistical learning

Statistical learning is a branch of (some would argue a synonym for) machine learning (ML) that uses statistical theory and algorithms to automatically discover patterns and relationships in data. After learning from observed data, statistical learning models can make predictions about future events without explicit instructions from a human programmer.

Machine learning can be viewed as a subset of artificial intelligence (AI).

Some machines learn too much

Arnold

A Statistical Learning Approach to Product Pricing

Introduction to statistical learning

To predict insurance loss costs we use what are called supervised learning algorithms. Supervised learning methods attempt to learn functions that map input information to outputs.

\[ \begin{align} y_i &= f(\pmb{x}_i) + \epsilon_i \end{align} \]

In our example we estimate a function that maps predictor variable values to expected auto insurance loss cost.

\[ \begin{align} \hat{y_i} &= \widehat{f}(\text{License Age}_i,...,\text{Max Speed}_i) \end{align} \]

A Statistical Learning Approach to Product Pricing

Gradient boosting machines

We will use gradient boosting machines (GBM) with a Tweedie loss function to build a predictive model for loss cost.

GBM

Trust me, GBMs are interesting and work very well for insurance pricing data…
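To give a feel for the mechanics, here is a deliberately bare-bones, pure-Python sketch of gradient boosting with depth-1 trees (stumps), Newton-style leaf values, and the Tweedie gradient on the log scale. The variance power p = 1.5 and all data are assumptions for this toy example; production work uses a library such as LightGBM or XGBoost, both of which ship Tweedie objectives, plus regularization and proper validation:

```python
import math

P, LR = 1.5, 0.1   # Tweedie variance power and learning rate (assumed values)

def grad_hess(y, f):
    # negative gradient and Hessian of the Tweedie deviance wrt log-scale score f
    mu = math.exp(f)
    g = y * mu ** (1 - P) - mu ** (2 - P)
    h = (2 - P) * mu ** (2 - P) - (1 - P) * y * mu ** (1 - P)
    return g, h

def fit_stump(x, g, h):
    # depth-1 tree: pick the split with the largest gain (G^2/H per leaf)
    best_gain, out = -1.0, None
    for s in sorted(set(x))[:-1]:
        left = [i for i, xi in enumerate(x) if xi <= s]
        right = [i for i, xi in enumerate(x) if xi > s]
        GL, HL = sum(g[i] for i in left), sum(h[i] for i in left)
        GR, HR = sum(g[i] for i in right), sum(h[i] for i in right)
        gain = GL * GL / HL + GR * GR / HR
        if gain > best_gain:
            best_gain, out = gain, (s, GL / HL, GR / HR)  # Newton leaf values
    return out

def fit_gbm(x, y, n_rounds=200):
    f0 = math.log(sum(y) / len(y))   # initialize at the log of the mean loss cost
    f = [f0] * len(x)
    stumps = []
    for _ in range(n_rounds):
        gh = [grad_hess(yi, fi) for yi, fi in zip(y, f)]
        g, h = [a for a, _ in gh], [b for _, b in gh]
        s, vl, vr = fit_stump(x, g, h)
        stumps.append((s, vl, vr))
        f = [fi + LR * (vl if xi <= s else vr) for fi, xi in zip(f, x)]
    return f0, stumps

def predict(model, xi):
    f0, stumps = model
    return math.exp(f0 + sum(LR * (vl if xi <= s else vr) for s, vl, vr in stumps))

# toy data: young drivers (age < 30) generate higher loss costs; most risks are zero
age = [20, 22, 25, 28, 35, 40, 45, 50, 55, 60] * 10
loss = [900.0, 0.0, 1200.0, 0.0, 0.0, 300.0, 0.0, 0.0, 250.0, 0.0] * 10
model = fit_gbm(age, loss)
print(predict(model, 25) > predict(model, 45))  # young risk priced higher
```

Each round fits a stump to the current pseudo-residuals and nudges the log-scale score toward them, so the ensemble gradually learns non-linear effects without anyone specifying them explicitly.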

A Statistical Learning Approach to Product Pricing

Variable importance

The GBM can provide us with an assessment of variable importance. These are the variables that, when their values change, have the largest impact on the model's predictions.

plot of chunk varimp
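GBM libraries typically report split- or gain-based importance; a model-agnostic alternative that conveys the same idea is permutation importance: shuffle one variable and measure how much the error grows. A toy sketch with a stand-in predict function (everything below is invented for illustration):

```python
import random

random.seed(0)

# toy data: loss cost depends strongly on x1 and not at all on x2
data = [(random.random(), random.random()) for _ in range(1000)]
y = [100 * x1 for x1, _ in data]

def model(x1, x2):
    # stand-in for a fitted GBM's predict function
    return 100 * x1

def mse(preds, actual):
    return sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(actual)

base = mse([model(*row) for row in data], y)

# permutation importance: shuffle one column, measure the increase in error
importance = {}
for col in (0, 1):
    shuffled = [row[col] for row in data]
    random.shuffle(shuffled)
    rows = [(s if col == 0 else x1, s if col == 1 else x2)
            for s, (x1, x2) in zip(shuffled, data)]
    importance[col] = mse([model(*row) for row in rows], y) - base

print(importance[0] > importance[1])  # x1 matters, x2 does not
```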

The following slides show the marginal effects of each predictor variable on predicted loss cost.

A Statistical Learning Approach to Product Pricing

Partial dependence

plots of chunks pd1–pd7 (one partial dependence plot per top predictor variable)

A Statistical Learning Approach to Product Pricing

OK that’s enough slides for a bit. Let’s turn this model into something useful.

rshiny

Beyond Modeling: The Role of the Modern Data Scientist

We don’t just fit models

Data science is a rapidly changing field. In addition to fitting statistical and machine learning models, a data scientist must be familiar with,

  • Database design and programming languages
  • Software engineering principles
    • OOP, program design, version control, DevOps
  • Presenting findings to non-technical audiences

At the center of every Venn diagram

doitall

For the Aspiring Data Scientist

How to become a data scientist
  1. Complete an undergraduate degree with a strong quantitative focus
    • Statistics, Mathematics, Computer Science, Physics, Electrical Engineering, Economics (focus on econometrics)
  2. Complete a graduate degree
    • PhD or MS in Statistics, Computer Science or a related field with relevant coursework and research
  3. Learn to code
    • R, Python, Java, Scala, SQL
  4. Show that you can do useful things
    • Complete internships
    • Publish research
    • Open a GitHub account and create cool things

For the Aspiring Data Scientist

Some resources
  • Coursera: https://www.coursera.org/
    • Free and paid online courses which include topics in math, statistics, data science, computer science, and more
  • Data Camp: https://www.datacamp.com/
    • Free and paid online courses with an emphasis on practical data science in R and Python
  • GitHub: https://github.com/
    • Host and manage git repositories
  • New England Statistical Society: https://nestat.org/
    • We're just getting started, but we will soon be providing educational materials for aspiring data scientists as well